Journal of Graphics

Generative model based unsupervised multi-view stereo network

Yuxuan PAN, Rui JIN, Yu LIU, Lin ZHANG

Journal of Graphics. 2026, 47(1): 29-38.

Existing research on multi-view stereo utilizes depth-estimation algorithms to achieve stereo representation by establishing a mapping between the physical and digital worlds. Supervised learning-based neural networks have achieved accurate, high-fidelity 3D reconstruction through training. However, in-the-wild visual reconstruction remains challenging due to the lack of rendered depth priors and the wide-baseline characteristics of the images. A novel system was proposed to obtain optimized depth for naturally collected multi-view images without prior information by combining an unsupervised learning network with semantically optimized Neural Radiance Field (NeRF) rendering. First, preliminary depth information for in-the-wild multi-view images was produced without ground truth by unsupervised deep learning. Subsequently, in a separate NeRF module, a diffusion model was used to construct a surface semantic rendering loss, enabling a fine-grained volumetric representation. Experimental results on the benchmark dataset validated the proposed system, which improved the overall metrics by an average of 24.6% compared with other state-of-the-art schemes. A novel in-the-wild wide-baseline dataset was also used to verify generalization performance, on which the proposed system reduced the reconstruction error by up to 40.8% compared with all other methods.

A mixed-precision quantization method for large language models via memory alignment

Zhangming LI, Weifan GUAN, Zhengwei CHANG, Linghao ZHANG, Qinghao HU

Journal of Graphics. 2026, 47(1): 39-47.

As large models continue to grow in scale, the memory footprint and computational overhead of model inference have become critical challenges. Mixed-precision quantization is an effective approach to reducing resource consumption, but existing methods suffer from insufficient outlier handling, significant quantization accuracy loss, and inefficient memory access. To address these issues, a memory-aligned mixed-precision quantization method for large models was proposed. First, weights were divided into SIMD-aligned groups, and outlier groups were identified via group-wise significance analysis, with high-significance groups quantized to 8 bits and the others to 2 bits. A block-wise compensation strategy was introduced to mitigate the accuracy degradation caused by 2-bit quantization. Furthermore, an efficient packing and storage scheme was designed for mixed-precision weights, in which a bitmap recorded the bit width of each data block, enabling random access. Experimental results demonstrated that the proposed method significantly reduced memory usage and improved computational efficiency while maintaining model accuracy. Specifically, on Llama2-7B/13B/70B, the approach achieved perplexity reductions of 8.13/2.84/1.37 on WikiText-2 and 5.80 on C4 relative to state-of-the-art baselines. The quantized 70B model reduced weight storage by approximately 87% compared with BF16. Across seven QA benchmarks, an average accuracy gain of 6.24% was achieved. Overall, these results indicated that the memory-aligned mixed-precision quantization method could simultaneously improve compression ratio, memory-access efficiency, and overall model performance.
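
The grouping, bit-width assignment, and bitmap lookup described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the significance score (largest absolute weight in a group), the min/max uniform quantizer, the `high_fraction` parameter, and all function names are assumptions made for illustration, and the block-wise compensation step is omitted.

```python
def quantize_group(values, bits):
    """Uniform min/max quantization of one group to `bits` bits (assumed quantizer)."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize_group(codes, lo, scale):
    """Map integer codes back to approximate weights."""
    return [lo + c * scale for c in codes]


def mixed_precision_quantize(weights, group_size=4, high_fraction=0.25):
    """Split weights into fixed-size groups and pick a bit width per group.

    Returns (bitmap, packed): bitmap[i] is 1 for an 8-bit group and 0 for a
    2-bit group; packed[i] is (codes, lo, scale) for group i.
    """
    groups = [weights[i:i + group_size] for i in range(0, len(weights), group_size)]
    # Assumed significance score: the largest magnitude inside the group.
    significance = [max(abs(v) for v in g) for g in groups]
    n_high = max(1, round(high_fraction * len(groups)))
    ranked = sorted(range(len(groups)), key=lambda i: significance[i], reverse=True)
    high_ids = set(ranked[:n_high])
    bitmap = [1 if i in high_ids else 0 for i in range(len(groups))]
    packed = [quantize_group(g, 8 if bitmap[i] else 2) for i, g in enumerate(groups)]
    return bitmap, packed


def group_bit_offset(bitmap, index, group_size=4):
    """Bit offset of group `index` in a densely packed stream, from the bitmap alone."""
    n_high = sum(bitmap[:index])   # 8-bit groups stored before this one
    n_low = index - n_high         # 2-bit groups stored before this one
    return group_size * (8 * n_high + 2 * n_low)
```

Because every group holds the same number of weights, the start of any group follows from counting the set bits before it in the bitmap, which is what makes random access possible without decoding earlier groups. A practical kernel would choose `group_size` to match the SIMD width and would precompute prefix counts instead of summing on every lookup.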

MBSE-based conceptual design method for complex forming equipment

Boya WANG, Shaozong WANG, Wanran YANG, Xingwei ZHOU, Liang HOU, Chengyue XIONG

Journal of Graphics. 2026, 47(1): 179-193.

The traditional development approach for complex forming equipment typically relies on Document-Based Systems Engineering (DBSE), which often leads to protracted development cycles due to inadequate requirement analysis, incomplete requirement coverage caused by textual ambiguity, and equipment development that lags behind technological iteration. These shortcomings frequently result in final designs that fail to meet target performance metrics and require inefficient, repetitive modifications. Therefore, drawing on the U.S. Department of Defense Architecture Framework (DoDAF) combined with Model-Based Systems Engineering (MBSE), an MBSE-based method for the conceptual design stage of complex forming equipment was proposed. The method used five viewpoints (panoramic, capability, operational, systems, and standards) as entry points for the conceptual design of complex forming equipment. Through multi-perspective analysis, the method performed top-level requirements acquisition, requirements refinement analysis, functional analysis, and system modeling across four design levels. Eleven types of models were established using the Systems Modeling Language (SysML), enabling digital and procedural expression in the conceptual design stage of complex forming equipment. Finally, superplastic-forming equipment was used as a representative example to demonstrate the application of the design method. The application addressed the shortcomings of traditional design approaches and demonstrated that the method provided effective guidance for the forward development of complex forming equipment.

Performance evaluation of construction site object detection under drone-captured perspective

Zhuo SONG, Dehui LU, Zhichao HUANG, Shiyu TIAN, Ronglong YAN, Yichuan DENG

Journal of Graphics. 2026, 47(1): 68-77.

The organizational management of construction sites is a critical aspect of engineering management; however, traditional human supervision is constrained by environmental limitations and low efficiency. In recent years, multiple government departments have issued policies advocating deep integration of artificial intelligence with the real economy to promote high-quality and efficient economic development. The accuracy, efficiency, and automation advantages of Computer Vision (CV) technology have gradually led to its widespread application in construction supervision. Meanwhile, drones, which can efficiently capture complex and varied visual data of construction scenes, have shown application potential in CV-based construction supervision tasks. However, current research on drone-based construction scene detection is limited, and the lack of overhead-perspective construction-scene image datasets restricts further development in the field. Therefore, a DJI Mavic 3T drone was used to capture construction-site images and establish UB-CSD, an open-source overhead image dataset for construction scenes. Several advanced object-detection algorithms were selected for comparative experiments on the UB-CSD dataset, and the reasons for their performance differences were analyzed from multiple dimensions, including model workflow design, computation principle, and task characteristics. The mAP values were 96.1% for YOLOv8 and YOLOv10, 96.0% for YOLOv9, 95.7% for YOLO11, 95.3% for DETR, 76.3% for Faster-RCNN, and 72.1% for RetinaNet. The analysis indicated that the YOLO series algorithms were the most suitable for drone-based object detection tasks in construction scenes. By establishing a new open-source dedicated dataset and conducting comparative experiments, the study provided data and experimental cases to support future safety production management and object-detection algorithm research in the construction industry.

Defect detection of aero-engine blades based on dynamic vision sensors

Xingshun ZHANG, Haiyong CHEN

Journal of Graphics. 2026, 47(1): 120-130.

Aeroengine blades are core components of engines; tiny surface defects can lead to serious safety accidents. Traditional vision-based detection is limited by motion blur, low dynamic range, and background redundancy. To address these challenges, an aeroengine blade defect-detection method based on the Dynamic Vision Sensor (DVS) was proposed. Dynamic vision sensors produce data in an asynchronous event-stream format and are therefore also referred to as event cameras; they offer a large dynamic range, a high frame rate, and a strong ability to capture small targets. First, a DVS-based defect-detection platform was built, and its imaging characteristics and advantages were explored. On this basis, the first DVS-based Event-based Defect Detection Dataset of Aeroengine Blade (EDD-AB) was constructed, covering nearly 6,000 images of scratches, point marks, and edge damage, with approximately 12,000 finely annotated target labels. The dataset was released as open source (link: https://github.com/NiBieZhouMei5520/EDD-AB.git). Furthermore, a multi-scale defect-detection algorithm based on asynchronous event-stream frame aggregation (AEAF-ABDD) was proposed: event streams were visualized through frame aggregation using a fixed time window; a Multi-Resolution Adaptive Feature Pyramid Network (MRAFPN) was developed to enhance multi-scale defect feature extraction; a lightweight SimAM attention mechanism was incorporated to strengthen the focus on key regions; and a star-convolution module (StarNet) was fused to improve the efficiency of high-dimensional nonlinear feature mapping, enabling accurate detection of multi-scale defects on complex curved workpieces. Experiments demonstrated that AEAF-ABDD achieved a mean Average Precision (mAP) of 97.7% on the EDD-AB dataset and a detection speed of 105 frames per second, substantially outperforming mainstream algorithms. An efficient solution for automated quality inspection of highly reflective curved workpieces was thereby provided, promoting the application of DVS in industrial inspection.
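
Frame aggregation over a fixed time window, as mentioned in this abstract, can be sketched as follows. This is an illustrative assumption rather than the AEAF-ABDD pipeline: the event layout `(timestamp_us, x, y, polarity)`, the signed accumulation of polarities, and the function name are all hypothetical.

```python
def aggregate_events(events, width, height, window_us):
    """Bin an asynchronous event stream into frames of fixed duration.

    events: iterable of (timestamp_us, x, y, polarity), polarity in {-1, +1}.
    Returns a list of frames; frames[k][y][x] is the summed polarity of the
    events that hit pixel (x, y) during the k-th time window.
    """
    events = sorted(events)          # order by timestamp
    if not events:
        return []
    t0 = events[0][0]
    n_frames = (events[-1][0] - t0) // window_us + 1
    frames = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for t, x, y, polarity in events:
        frames[(t - t0) // window_us][y][x] += polarity
    return frames
```

A shorter window preserves temporal detail but yields sparser frames, while a longer window gives denser frames at the cost of blurring fast motion, so the window length is a tuning choice for the downstream detector.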

Conservative enclosing box construction algorithm based on implicit geometric coding with Lipschitz linear constraints

Bingyu ZHANG, Liqun KUANG, Fengguang XIONG, Fanshu SUN, Shichao JIAO

Journal of Graphics. 2026, 47(1): 152-161.

Mainstream bounding-box methods are widely used in 3D scene rendering, ray tracing, and collision detection; however, they suffer from low space utilization and insufficient accuracy when fitting complex geometries, struggle to guarantee strict conservatism, and leave room for improvement in reducing false-detection rates. To address these issues, a conservative bounding-box construction method combining implicit geometric coding and Lipschitz constraints was proposed. Implicit geometric coding mapped the input coordinates to a high-dimensional space via positional encoding, thus capturing local and global geometric information and improving bounding-box adaptability. A trainable Lipschitz-constrained linear layer was introduced to dynamically adjust the Lipschitz constants and control gradient changes, and a Lipschitz regularization loss was combined with a dynamically weighted cross-entropy loss to reduce the false-positive rate while optimizing boundary fitting. Experimental results demonstrated that the method achieved a false-negative rate of 0 on multiple 3D models, reduced the false-detection rate by up to 3.1% compared with the benchmark method, and shortened the single-ray query time by 1.7 ms, providing an efficient and robust solution for high-precision conservative bounding-box fitting.
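
The two building blocks named in this abstract can be sketched in a few lines. The abstract does not give the exact formulation, so both parts below are assumptions: the encoding uses sine and cosine at frequencies of 2^k π, and the layer rescales each weight row so that its absolute sum stays below softplus(c), where c stands for a trainable scalar. This bounds the layer's Lipschitz constant in the infinity norm. All function names are hypothetical.

```python
import math


def positional_encoding(point, num_freqs):
    """Map each coordinate to sin/cos features at frequencies 2^k * pi."""
    features = []
    for x in point:
        for k in range(num_freqs):
            features.append(math.sin((2 ** k) * math.pi * x))
            features.append(math.cos((2 ** k) * math.pi * x))
    return features


def softplus(c):
    """Smooth positive function used to keep the Lipschitz bound positive."""
    return math.log1p(math.exp(c))


def lipschitz_normalize(weight, c):
    """Rescale rows so every absolute row sum is at most softplus(c)."""
    bound = softplus(c)
    normalized = []
    for row in weight:
        row_sum = sum(abs(w) for w in row)
        scale = min(1.0, bound / row_sum) if row_sum > 0 else 1.0
        normalized.append([w * scale for w in row])
    return normalized


def linear(weight, bias, x):
    """Plain affine map y = W x + b on Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]
```

With the rows normalized this way, the change in any output is at most softplus(c) times the largest change in any input, so penalizing softplus(c) during training is one way to express a Lipschitz regularization term.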

BSD-YOLO: a small target vehicle detection method based on dynamic sparse attention and adaptive detection head

Biao YANG, Xue WANG, Zheng GUAN, Ping LONG

Journal of Graphics. 2026, 47(1): 99-110.

In intelligent traffic monitoring systems, small target vehicle detection in complex scenes faces challenges such as low feature resolution, severe occlusion interference, computational redundancy, and insufficient bounding-box regression accuracy. To balance detection accuracy with deployment efficiency on edge devices, an improved YOLOv8 framework based on dynamic sparse attention and a lightweight dual-branch structure was proposed. The method first introduced a bidirectional routing sparse attention mechanism (ReBiAttention) that enhanced the retention of shallow features for small targets by dynamically filtering key features through a two-level routing strategy. Subsequently, GSConv and VoV-GSCSP modules were integrated to reduce computational cost while dynamically adjusting multi-scale feature weights. An improved DynamicHead was applied for multi-task adaptive optimization, and a modified ShapeIoU loss function with shape- and scale-aware weighting was employed to improve localization accuracy. Experiments on the UA-DETRAC dataset showed that, relative to baseline YOLOv8n, Precision, Recall, and mAP@0.5 increased by 8.739%, 1.685%, and 7.225%, respectively, while the parameter count decreased by 4.3%. This method provided an efficient solution for accurate detection of small-target vehicles in complex traffic scenarios.

Neural radiance field reconstruction based on feature point-guided interference identification

Hao REN, Shaobo LI, Mao GONG, Bo WANG

Journal of Graphics. 2026, 47(1): 111-119.

To address the challenge of achieving high-quality 3D reconstruction with Neural Radiance Fields (NeRF) in the presence of occluding objects, a method based on the collaborative optimization of Structure-from-Motion (SfM) and the Segment Anything Model (SAM) was proposed. Building upon the Scale-Invariant Feature Transform (SIFT) algorithm within the SfM reconstruction process, geometric inconsistencies in dynamic scenes were leveraged for feature point identification and matching. Unmatched feature points were treated as belonging to dynamic occluders and used to guide SAM, which supports point-prompted segmentation, in segmenting the dynamic occluders and generating a static scene mask. Based on the segmentation results, mask-aware volumetric rendering was used to predict colors, and a four-term loss function was established, comprising reconstruction loss, structural consistency loss, adversarial loss, and self-supervised patching loss. These objectives were jointly optimized to constrain the color output in patched regions. After iterative training, consistent restoration of geometric structure and appearance in occluded areas across multiple viewpoints was achieved, and radiometric integrity was preserved while occlusions were removed. Validation on public dynamic scene datasets demonstrated that mask-based volumetric rendering combined with joint optimization produced an average Peak Signal-to-Noise Ratio (PSNR) improvement of 5.24 dB over baseline models and mainstream occlusion removal methods, alongside a 35% reduction in Learned Perceptual Image Patch Similarity (LPIPS). This approach established a new paradigm for 3D reconstruction in complex dynamic environments.

Research on dynamic voxelization-based collision detection in construction scenarios

Hao LIN, Zhiming WU, Jilan JIN

Journal of Graphics. 2026, 47(1): 204-215.

Among all safety accidents in construction scenarios, collisions are regarded as one of the most common causes of injury. To prevent and monitor collision accidents, computer graphics analysis has been used to assist collision detection and analysis; however, limitations remain in balancing real-time performance with high detection precision. To address this, a collision-detection method based on dynamic voxelization was proposed. The method integrated the generation of a dynamic spatial voxel tree with dynamic spherical voxelization of resources to construct a collision detection and analysis mechanism. The core ideas are as follows: ① Based on a crowding-degree threshold, the space was recursively divided to generate a dynamic voxel tree, effectively filtering out areas with no collision risk. ② The side length of the voxel units was dynamically calculated according to the relative distance between resources and the resource volume, realizing adaptive adjustment of voxel granularity. ③ Spherical voxels were used instead of traditional cubic voxels to avoid the computational burden of non-axis-aligned detection. ④ A hollowing-out procedure was introduced to eliminate invalid internal voxels, further optimizing detection efficiency. The method can accurately capture resource interactions in complex dynamic construction environments, improving detection accuracy and computational efficiency. Experimental results showed that, compared with traditional methods, the proposed method significantly improved detection accuracy, with precision and accuracy reaching 94.64% and 96.67%, respectively. In terms of collision detection time, it was more efficient than most existing methods, with a computation-speed increase of about 11.36% or more. The study also analyzed the impact of key parameters such as voxel-tree depth, root-node size, and voxel side length on performance, as well as the CPU and memory consumption of the method in scenarios of different scales. The consumption was within an acceptable range, verifying the applicability of the method in construction scenarios. The method provided an effective new information-processing approach for raising the intelligence level of construction safety management.
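
Two of the ideas listed in this abstract, recursive subdivision driven by a crowding threshold and overlap tests between spherical voxels, can be sketched as follows. This is a simplified illustration and not the published method: resources are reduced to single center points, the threshold is a plain count per cell, and the function names and parameters are hypothetical. Assigning a resource to one cell by its center can miss contacts across cell boundaries, which a complete implementation would have to handle.

```python
def spheres_overlap(center_a, radius_a, center_b, radius_b):
    """Two spheres intersect when the center distance is at most the radius sum."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    return dist_sq <= (radius_a + radius_b) ** 2


def candidate_cells(resources, origin, size, threshold, max_depth, depth=0):
    """Recursively split a cubic cell into octants while it is crowded.

    resources: list of (resource_id, (x, y, z)).
    Returns lists of resource ids sharing a leaf cell; cells holding fewer
    than two resources carry no collision risk and are dropped.
    """
    if len(resources) < 2:
        return []
    if len(resources) <= threshold or depth == max_depth:
        return [[rid for rid, _ in resources]]
    half = size / 2
    octants = {}
    for rid, pos in resources:
        key = tuple(int(pos[i] >= origin[i] + half) for i in range(3))
        octants.setdefault(key, []).append((rid, pos))
    cells = []
    for key, members in octants.items():
        child_origin = tuple(origin[i] + key[i] * half for i in range(3))
        cells += candidate_cells(members, child_origin, half, threshold, max_depth, depth + 1)
    return cells
```

Only the pairs inside each returned cell need the exact test with `spheres_overlap`. Since a sphere test depends only on center distance and radii, it is unaffected by how a resource is rotated, which is the advantage over cubic voxels noted in the abstract.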

A dynamic pruning approach for cross-domain few-shot image generation

Shiliang LI, Qiang FANG, Yihua WANG, Yifei SHI, Zhuo WANG, Zeyu LI, Yunfei XIE, Jia WANG

Journal of Graphics. 2026, 47(1): 131-142.

Few-shot image generation has important application value in fields such as medical imaging and artistic creation. In recent years, significant progress has been made on this task, with mainstream approaches typically transferring generative models pretrained on large-scale source-domain datasets to target domains to mitigate data scarcity. However, when substantial semantic gaps exist between source and target domains, direct transfer often introduces incompatible source-specific features, degrading image realism and style consistency. Although existing methods remove redundant features via static pruning strategies, such as fixed-threshold filter pruning, they struggle to adapt to the dynamic evolution of features across different layers of deep networks, often mistakenly removing general low-level features while retaining redundant high-level ones, which harms the adaptation performance and generation quality of the model. To address this, a dynamic pruning method based on filter-importance estimation was proposed. Specifically, the method continuously tracked the changes in the Fisher information of each layer's filters during training to evaluate their importance for image generation quality. Based on the Fisher information, an adaptive pruning mechanism based on cumulative importance weights was constructed to dynamically determine the pruning ratio for each layer, enabling more precise removal of redundant or incompatible filters while preserving general structural semantic information. Experiments were conducted on several representative few-shot target domains, and the results showed that the proposed method significantly outperformed existing approaches in terms of image quality (Fréchet Inception Distance, FID) and image diversity (Intra-domain Learned Perceptual Image Patch Similarity, Intra-LPIPS). In target domains exhibiting significant semantic differences from the source domain, the proposed method achieved better FID scores than the current state-of-the-art methods, demonstrating its stability and superiority for cross-domain few-shot image generation tasks.
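
One plausible reading of the pruning mechanism in this abstract is sketched below. It is an assumption, not the authors' algorithm: Fisher information is approximated by the mean squared gradient of each filter averaged over training steps, and a layer keeps the most important filters until they cover a chosen share (`coverage`) of the layer's total importance. The function names and the `coverage` parameter are hypothetical.

```python
def fisher_importance(grad_history):
    """Average squared gradient per filter over the recorded training steps.

    grad_history[t][f] is the list of gradient values of filter f at step t.
    """
    n_filters = len(grad_history[0])
    totals = [0.0] * n_filters
    for step in grad_history:
        for f, grads in enumerate(step):
            totals[f] += sum(g * g for g in grads) / len(grads)
    return [total / len(grad_history) for total in totals]


def filters_to_keep(importance, coverage=0.95):
    """Keep the top filters whose cumulative importance reaches the coverage share."""
    order = sorted(range(len(importance)), key=lambda f: importance[f], reverse=True)
    total = sum(importance)
    kept, running = [], 0.0
    for f in order:
        if total > 0 and running >= coverage * total:
            break
        kept.append(f)
        running += importance[f]
    return sorted(kept)


def pruning_ratio(importance, coverage=0.95):
    """Share of filters removed in one layer under the cumulative rule."""
    return 1.0 - len(filters_to_keep(importance, coverage)) / len(importance)
```

Under this rule the pruning ratio is not fixed in advance: a layer whose importance is concentrated in a few filters is pruned heavily, while a layer with evenly spread importance keeps most of its filters.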